Approximating and testing k-histogram distributions in sub-linear time

Authors

  • Piotr Indyk
  • Reut Levi
  • Ronitt Rubinfeld
Abstract

A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that are k-histograms from distributions that are ε-far from any k-histogram in the ℓ₁ distance and the ℓ₂ distance, respectively.
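To make the definition concrete, the following is a minimal Python sketch of the representation described in the abstract: a list of k intervals with k corresponding values, plus the ℓ₁ and ℓ₂ distances between two distributions over [n]. The `(start, end, value)` tuple layout and the function names are assumptions made for illustration and are not taken from the paper. The sketch expands the histogram into all n entries, so it only illustrates the definition; it is not the sub-linear algorithm the abstract refers to.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (start, end, value) means p(i) = value
# for every i with start <= i <= end, using 1-indexed points of [n].
Piece = Tuple[int, int, float]


def histogram_pmf(pieces: Sequence[Piece], n: int) -> List[float]:
    """Expand a list of k (interval, value) pieces into n explicit probabilities."""
    pmf = [0.0] * n
    for start, end, value in pieces:
        for i in range(start, end + 1):
            pmf[i - 1] = value
    return pmf


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two distributions over [n]."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two distributions over [n]."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# A 2-histogram over [6]: two pieces, each carrying total mass 1/2.
pieces = [(1, 2, 0.25), (3, 6, 0.125)]
h = histogram_pmf(pieces, 6)

# The uniform distribution over [6] is a 1-histogram.
uniform = [1 / 6] * 6

print(h)
print(l1_distance(h, uniform), l2_distance(h, uniform))
```

Storing only the k pieces takes space proportional to k rather than n, which is why a k-histogram is a compact summary when k is much smaller than n.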

Similar resources

Learning k-modal distributions via testing

A k-modal probability distribution over the domain {1, ..., n} is one whose histogram has at most k “peaks” and “valleys.” Such distributions are natural generalizations of monotone (k = 0) and unimodal (k = 1) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of learning an unknown k-modal distribution. Th...

Approximating the Distributions of Singular Quadratic Expressions and their Ratios

Noncentral indefinite quadratic expressions in possibly non-singular normal vectors are represented in terms of the difference of two positive definite quadratic forms and an independently distributed linear combination of standard normal random variables. This result also applies to quadratic forms in singular normal vectors for which no general representation is currently available. The ...

Histogram analysis- a useful tool for tissue characterization in brain CT

Introduction: Pixel value in computed tomography (CT) gives the average linear attenuation coefficient of the scanned material in the path of the x-ray beam, being normalized to that of water. It is known that attenuation coefficient or HU value is a function of the chemical characteristic of the material and of the x-ray energy. The CT image shows the HU value by a shade of gr...

Understanding and Improving Residual Distributions for Linear Poisson Models

The method of Linear Poisson Models (LPMs) is able to construct approximating linear models of histogram data behaviour based upon the assumption of independent Poisson noise. Crucially the method has been developed with associated techniques for the assessment of the effects of noise on both model construction and subsequent use of these models in quantitative data analysis. As a consequence i...

Near-Optimal Closeness Testing of Discrete Histogram Distributions

We investigate the problem of testing the equivalence between two discrete histograms. A k-histogram over [n] is a probability distribution that is piecewise constant over some set of k intervals over [n]. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two k-histogram distributions p, q over [n], we want to distinguish (with high probabi...

Publication date: 2012